Perfect simulation for locally continuous chains of infinite order



Sandro Gallo 

Instituto de Matemática, Universidade Federal de Rio de Janeiro, Brazil

Nancy L. Garcia* 

Instituto de Matemática, Estatística e Computação Científica, Universidade Estadual de Campinas, Brazil



Abstract 

We establish sufficient conditions for perfect simulation of chains of infinite order on 
a countable alphabet. The new assumption, localized continuity, is formalized with 
the help of the notion of context trees, and includes the traditional continuous case, 
probabilistic context trees and discontinuous kernels. Since our assumptions are 
more refined than uniform continuity, our algorithms perfectly simulate continuous 
chains faster than the existing algorithms of the literature. We provide several 
illustrative examples.

1. Introduction

The objects of this paper are stationary stochastic chains of infinite order on a 
countable alphabet. These chains are said to be compatible with a set of transition 
probabilities (depending on an unbounded portion of the past) if the latter is a regular
version of the conditional expectation of the former. This reflects the idea that
chains of infinite order are usually determined by their conditional probabilities with
respect to the past. Given a set of transition probabilities (or simply kernels in the
sequel), two natural questions are (1) existence: does there exist a stationary chain
compatible with it? and (2) uniqueness: if yes, is it unique? A constructive way to
answer these questions positively is to provide an algorithm based on the transition
probabilities which converges a.s. and samples precisely from the stationary law
of the process compatible with the given kernel. This is precisely the focus of this
paper.



Perfect simulation for chains of infinite order was first done by Comets et al. (2002)
under the continuity assumption. They used the fact, observed earlier by Kalikow
(1990), that under this assumption the transition probability kernel can be decomposed
as a countable mixture of Markov kernels. Then, Gallo (2011) obtained a
perfect simulation algorithm for chains compatible with a class of unbounded probabilistic
context trees where each infinite size branch can be a discontinuity point.
Both of them use an extended version of the so-called coupling from the past algorithm
(CFTP in the sequel) introduced by Propp & Wilson (1996) for Markov
chains. Recently, Gallo & Garcia (2010) proposed a combination of these
algorithms to cover cases where the kernels are neither necessarily continuous nor
necessarily probabilistic context trees. In the present paper we consider a broader
class of kernels that includes all the above cases; in fact, all the results of the
works cited above can be obtained as corollaries of the present work.



Other recent results in the area are the papers of Garivier (2011) and De Santis &
Piccioni (2012). The former introduced an elegant CFTP algorithm which works
without the weak non-nullness assumption, designed à la Propp & Wilson (1996).
This work does not intend to exhibit explicit sufficient conditions for a CFTP to
be feasible and has a more algorithmically motivated approach. The latter introduced
an interesting framework, making use of a priori knowledge about the histories,
extracted from the auxiliary sequence of random variables used for the simulation.
Their general conditions are not explicitly given on the kernel, which makes the
comparison with our method difficult. Notice, however, that our result is a strict generalization
of Theorem 4.1 in Comets et al. (2002), whereas the regime of slow continuity rate is
not present in their paper. To be more transparent, we show that all the examples
of De Santis & Piccioni (2012) satisfy our conditions when considering the weakly
non-null cases.

Let us emphasize also that discontinuities appear quite naturally. Section 5 presents
several examples. On the other hand, relaxing the continuity assumption has an
interest not only from a mathematical point of view, but also from an applied point 
of view. Practitioners generally seek to build models which are as general as possible. 
From data, it is not possible to check the rate of decay of the dependence on the 
past, and therefore, we do not know if we have continuity. 

One of the main concepts introduced in this paper is the notion of skeleton related
to a transition probability kernel. It is the smallest context tree composed of the
set of pasts which have a continuity rate which converges slowly to zero, or even
which does not converge to zero (discontinuity points). This concept is reminiscent
of the concept of bad pasts, meaning the set of discontinuous pasts for a given two-sided
specification, which appears in the framework of almost-Gibbsianity in the
statistical physics literature. We refer to van Enter et al. (2008) for a discussion
on the subject. Almost-Gibbs measures appear in several situations, for example
random walk in random scenery (see, for example, den Hollander et al. (2005) and
den Hollander & Steif (2006)), projection of the Ising model on a layer (Maes et al.
(1999)), intermittency (Maes et al. (2000)), projection of Markov measures (Chazottes
& Ugalde (2011)). From this point of view, our work exhibits a large class of
almost-Gibbs measures that can be perfectly simulated.



Our first main result, Theorem 4.1, deals with locally continuous chains. Local continuity
amounts to assuming that there exists a stopping time for the reversed-time
process, beyond which the decay of the dependence on the past occurs uniformly.
Theorem 4.1 states that, if the localized-continuity rate decays fast enough to zero,
we can perfectly simulate the stationary chain by CFTP. More precisely, according
to this rate, we specify several regimes for the tail distribution of the coalescence
time. Theorem 7.1 presents an interesting extension where we remove the local continuity
assumption. This means that this latter result deals with chains such that no
stopping time can tell whether or not the past we consider is a continuity point for
$P$. Here also, we give explicit examples motivating these two theorems.

It is important to emphasize that these results not only enlarge the class of processes
which can be perfectly simulated, but can also be interpreted as a method to
"speed up" the perfect simulation algorithm proposed by Comets et al. (2002).
Assume that the kernel is such that their algorithm can be performed (that is, it is
continuous, with a sufficiently fast continuity rate), but with an infinite expected
time, due to some pasts which slow down the continuity rate. Our method allows
us to include these pasts into the set of infinite size contexts of the skeleton. Then,
depending on the position of these branches (that is, depending on the form of the
skeleton), our results show that the perfect simulation might be done in a finite
expected time.

Our perfect simulation algorithm for the given kernel P requires that the skeleton 
is itself perfectly simulable. However, this apparent handicap is easy to overcome. 
Sufficient conditions for perfect simulability can be explicitly obtained for a wide 
class of skeleton context trees. Several examples are presented and used throughout 
the paper. 



It is worth mentioning that Foss & Konstantopoulos (2003) showed that the notion
of perfect simulation based on a coupling from the past is closely related to the
almost sure existence of a "renovating event". However, the difficulty always lies
in finding such an event for each specific problem. In the present case, the perfect
simulation scheme provides such a renovating event and gives conditions for its almost
sure occurrence in terms of the transition probability kernel.

The paper is organized as follows. In Section 2 we present the basic definitions,
the notation, and we introduce the coupling from the past algorithm for perfect
simulation in a generic way. Section 3 introduces the more specific notions of local
continuity and skeletons, which are fundamental for our approach to perfect simulation.
Our first main result on perfect simulation (Theorem 4.1) is presented in
Section 4 together with the corollaries of existence, uniqueness and regeneration
scheme which are directly inherited by the constructed stationary chain. Discussion
of these results and explicit examples of application are given in Section 5. The
proof of Theorem 4.1 is given in Section 6. Section 7 is dedicated to an extension
of Theorem 4.1. We finish the paper with some concluding remarks in Section 8.



2. Notation and basic definitions 

Let $A$ be a countable alphabet. Given two integers $m \le n$, we denote by $a_m^n$ the
string $a_m \ldots a_n$ of symbols in $A$. For any $m \le n$, the length of the string $a_m^n$ is
denoted by $|a_m^n|$ and defined by $n - m + 1$. We will often use the notation $\emptyset$, which
will stand for the empty string, having length $|\emptyset| = 0$. For any $n \in \mathbb{Z}$, we will use the
convention that $a_{n+1}^{n} = \emptyset$, and naturally $|a_{n+1}^{n}| = 0$. Given two strings $v$ and $v'$, we
denote by $vv'$ the string of length $|v| + |v'|$ obtained by concatenating the two strings.
If $v' = \emptyset$, then $v\emptyset = \emptyset v = v$. The concatenation of strings is also extended to the
case where $v = \ldots a_{-2}a_{-1}$ is a semi-infinite sequence of symbols. If $n \in \{1, 2, \ldots\}$
and $v$ is a finite string of symbols in $A$, $v^n = v \ldots v$ is the concatenation of $n$ times
the string $v$. In the case where $n = 0$, $v^0$ is the empty string $\emptyset$. Let

$$A^{-\mathbb{N}} = A^{\{\ldots,-2,-1\}} \quad\text{and}\quad A^{\star} = \bigcup_{j=0}^{+\infty} A^{\{-j,\ldots,-1\}}$$

be, respectively, the set of all infinite strings of past symbols and the set of all finite
strings of past symbols. The case $j = 0$ corresponds to the empty string $\emptyset$. Finally,
we denote by $\underline{a} = \ldots a_{-2}a_{-1}$ the elements of $A^{-\mathbb{N}}$.

Throughout this paper, we will often use the letters $u$, $v$ and $w$ for (finite or infinite)
strings of symbols of $A$, and the letters $i$, $j$, $k$, $l$, $m$ and $n$ for integers.

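Definition 2.1. A transition probability kernel (or simply kernel in the sequel) on
a countable alphabet $A$ is a function

$$P : A \times A^{-\mathbb{N}} \to [0, 1], \qquad (a, \underline{a}) \mapsto P(a|\underline{a}) \qquad (1)$$

such that $\sum_{a \in A} P(a|\underline{a}) = 1$ for any $\underline{a} \in A^{-\mathbb{N}}$.

For any $a \in A$, we define

$$\alpha(a) := \inf_{\underline{z}} P(a|\underline{z}) \quad\text{and}\quad \alpha_{-1} := \sum_{a \in A} \alpha(a).$$

Definition 2.2. We say that the kernel $P$ is weakly non-null if $\alpha_{-1} > 0$.

Notice that a given kernel $P$ is Markovian of order $k$ if $P(a|\underline{a}) = P(a|\underline{b})$ for any $\underline{a}$
and $\underline{b}$ such that $a_{-k}^{-1} = b_{-k}^{-1}$.

A given kernel $P$ is continuous (with respect to the product topology) at some point
$\underline{a}$ if $P(a|a_{-k}^{-1}\underline{z}) \to P(a|\underline{a})$ whenever $k$ diverges, for any $\underline{z}$. Continuous kernels are a
natural extension of Markov kernels.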

In this work we will need an equivalent definition of continuity. 

Definition 2.3. We say that an infinite sequence of past symbols $\underline{b}$ is a continuity
point (or past) for a given kernel $P$ if the sequence $\{\omega_k(\underline{b})\}_{k \ge 1}$ defined by

$$\omega_k(\underline{b}) := \sum_{a \in A} \inf_{\underline{z}} P(a|b_{-k}^{-1}\underline{z}), \quad k \ge 1, \qquad (2)$$

converges to 1, and a discontinuity point otherwise. We say that $P$ is (uniformly)
continuous if the sequence $\{\omega_k\}_{k \ge 1}$ defined by

$$\omega_k := \inf_{b_{-k}^{-1}} \sum_{a \in A} \inf_{\underline{z}} P(a|b_{-k}^{-1}\underline{z}), \quad k \ge 1, \qquad (3)$$

converges to 1, and discontinuous otherwise.

If the alphabet is finite, for instance, the set $A^{-\mathbb{N}}$ is compact and therefore our
uniform continuity is equivalent to asking that every point is continuous, by the
Heine-Cantor Theorem. But this is not the case in general since we are not assuming
finiteness of the alphabet.

We now introduce the objects of interest of the present paper, which are the stationary
chains compatible with a given kernel $P$.

Definition 2.4. A stationary stochastic chain $\mathbf{X} = (X_n)_{n \in \mathbb{Z}}$ of law $\mu$ on $A^{\mathbb{Z}}$ is said
to be compatible with a family of transition probabilities $P$ if the latter is a regular
version of the conditional probabilities of the former, that is

$$\mu(X_0 = a \mid X_{-\infty}^{-1} = a_{-\infty}^{-1}) = P(a|a_{-\infty}^{-1}) \qquad (4)$$

for every $a \in A$ and $\mu$-a.e. $a_{-\infty}^{-1}$ in $A^{-\mathbb{N}}$.

Standard questions, when we consider non-Markovian kernels, are

Q1. Does there exist a stationary chain compatible with $P$?
Q2. Is this chain unique?

These questions can be answered using the powerful constructive method of "perfect
simulation via coupling from the past". For a stationary stochastic chain, an algorithm
of perfect simulation aims to construct finite samples distributed according
to the stationary measure of the chain. Propp & Wilson (1996) introduced (in the
Markovian case, that is, in the case where $P$ is a transition matrix) the coupling
from the past (CFTP) algorithm. This class of algorithms uses a sequence of i.i.d.
random variables $\mathbf{U} = \{U_i\}_{i \in \mathbb{Z}}$, uniformly distributed in $[0, 1[$, to construct a sample
of the stationary chain.

From now on, every chain will be constructed as a function of $\mathbf{U}$ and therefore
the only probability space $(\Omega, \mathcal{F}, \mathbb{P})$ used in this paper is the one associated to this
sequence of i.i.d. r.v.'s.



A CFTP algorithm is completely determined by the update function $F$, from which
a coalescence time $\theta$ and a reconstruction function $\Phi$ are constructed.

The update function $F : A^{-\mathbb{N}} \cup A^{\star} \times [0, 1[ \to A$ has the property that for any $\underline{a} \in A^{-\mathbb{N}}$
and for any $a \in A$, $\mathbb{P}(F(\underline{a}, U_0) = a) = P(a|\underline{a})$. Define the iterations of $F$ by

$$F_{[k,l]}(\underline{a}, U_k^l) = F\big(\underline{a}F_{[k,k]}(\underline{a}, U_k)F_{[k,k+1]}(\underline{a}, U_k^{k+1}) \ldots F_{[k,l-1]}(\underline{a}, U_k^{l-1}),\ U_l\big), \qquad (5)$$

for any $-\infty < k < l \le +\infty$, where $F_{[k,k]}(\underline{a}, U_k) = F(\underline{a}, U_k)$. Based on these iterations,
we define, for any window $\{m, \ldots, n\}$, $-\infty < m \le n \le +\infty$, its coalescence
time as

$$\theta[m,n] := \max\{j \le m : F_{[j,n]}(\underline{a}, U_j^n) \text{ does not depend on } \underline{a}\}, \qquad (6)$$

with $\theta[n] := \theta[n,n]$. Finally, the reconstruction function of time $i$ is defined by

$$[\Phi(\mathbf{U})]_i = F_{[\theta[i],i]}(\underline{a}, U_{\theta[i]}^i). \qquad (7)$$

Given a kernel $P$, if we can find an $F$ such that $\theta[m,n]$ is a.s. finite for any
$-\infty < m \le n \le +\infty$, then the reconstructed sample $[\Phi(\mathbf{U})]_i$, $i = \theta[m,n], \ldots, n$, is
distributed according to the unique stationary measure. A well-known consequence
of this constructive argument is that there exists a unique stationary chain compatible
with $P$, answering questions Q1 and Q2 at the same time (see Corollary 4.1
below). Observe that the choice of the function $F$ is crucial in this approach: a "bad"
choice could lead to a coalescence time which is not a.s. finite, or which has a heavy-tailed
distribution with no finite expectation. But this choice depends on the kernel and,
in particular, according to the assumptions made on the kernel, we might guarantee
that there exists an $F$ for which $\theta[m,n]$ is a.s. finite. Another important observation
is that, a priori, such algorithms are not practical in the sense that, at each step,
they require that we generate all the pasts $\underline{a}$. For this reason, a particular update
function based on a length function (see (22)) will be defined, allowing us to decide,
for some pasts $\underline{a}$ and some values of $U_0$, what the value of $F(\underline{a}, U_0)$ is by looking only
at a finite portion of $\underline{a}$.
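
To fix ideas, the roles of the update function, the coalescence time and the reconstructed value can be illustrated in the simplest Markovian setting of Propp & Wilson (1996). The following is a minimal sketch under stated assumptions, not the algorithm of this paper: the state space is finite, the kernel is a hypothetical transition matrix `P` (a list of rows), the update function is the inverse-CDF rule, and the names `update` and `cftp` are illustrative. All starting states are driven by the same uniforms, and we go back one step at a time until the value at time 0 no longer depends on the starting state.

```python
import random

def update(state, u, P):
    """Update function F: next state from `state`, read off the uniform u by inverse CDF."""
    acc = 0.0
    for nxt, p in enumerate(P[state]):
        acc += p
        if u < acc:
            return nxt
    return len(P[state]) - 1  # guard against floating-point rounding in the last cell

def cftp(P, rng):
    """Return (theta, x0): the coalescence time theta <= 0 for time 0 and the value at time 0."""
    us = []  # us[j] is the uniform attached to time -j; values are reused, never redrawn
    t = 1
    while True:
        while len(us) < t:
            us.append(rng.random())
        # Start every state at time -(t-1) and apply U_{-(t-1)}, ..., U_0 in that order.
        states = set(range(len(P)))
        for j in range(t - 1, -1, -1):
            states = {update(s, us[j], P) for s in states}
        if len(states) == 1:
            # First (largest) starting time from which the value at 0 is state-independent.
            return -(t - 1), states.pop()
        t += 1
```

Going back one step at a time makes the returned time the largest $j \le 0$ with the coalescence property, in the spirit of (6); it is quadratic in $|\theta[0]|$ and is chosen here for clarity rather than efficiency.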

3. Local continuity and good skeleton 

Definition 3.1.

• A context tree on a given alphabet $A$ is a subset $\tau$ of $A^{-\mathbb{N}} \cup A^{\star}$
which forms a partition of $A^{-\mathbb{N}}$ and for which if $v \in \tau$, then $uv \notin \tau$ for any
non-empty $u \in A^{-\mathbb{N}} \cup A^{\star}$.

• For any context tree $\tau$, we denote by ${}^{<\infty}\tau$ the set of contexts of $\tau$ having
finite lengths and by ${}^{\infty}\tau$ the remaining contexts. Clearly, these subsets form a
partition of $\tau$.

• For any past $\underline{a} \in A^{-\mathbb{N}}$ we denote by $c_\tau(\underline{a})$ the unique element of $\tau$ which is a
suffix of $\underline{a}$.

For our purposes, a particular class of context trees on $A$ will be of interest.

Definition 3.2. A context tree is a skeleton context tree (or simply skeleton) if it
is the smallest context tree containing the set of its infinite length contexts. We also
consider $\emptyset$ to be a skeleton.

In order to illustrate this notion, let us give one simple example on $A = \{-1, +1\}$.
Let

$$\tau := \{1,\ 1(-1),\ 1(-1)(-1),\ 1(-1)(-1)(-1),\ \ldots\} \cup \{\underline{-1}\} \qquad (8)$$

and

$$\tau' := \{11,\ (-1)1,\ 11(-1),\ (-1)1(-1),\ 11(-1)(-1),\ (-1)1(-1)(-1),\ 11(-1)(-1)(-1),\ (-1)1(-1)(-1)(-1),\ \ldots\} \cup \{\underline{-1}\}. \qquad (9)$$

Observe that $\tau$ and $\tau'$ are indeed context trees (i.e. satisfy the requirements of the
first item of Definition 3.1). However, the only infinite length context in both trees
is $\underline{-1}$, and it is easy to see that $\tau$ is the smallest context tree having this unique
infinite length context. Therefore, $\tau$ is a skeleton whereas $\tau'$ is not. We will come
back several times to this skeleton throughout the paper.
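
The context function of the skeleton in (8) is easy to make concrete: the context of a past is the suffix that starts at its most recent symbol $+1$. The sketch below assumes a finite window of the past stored oldest symbol first; the function name `context_in_skeleton_8` and the use of `None` for "no $+1$ seen in the window" (the situation that corresponds to the infinite context of all $-1$'s if it persists) are illustrative choices, not notation from the paper.

```python
def context_in_skeleton_8(past):
    """Context of a finite past window for the skeleton {1, 1(-1), 1(-1)(-1), ...}.

    `past` lists symbols in {-1, +1}, oldest first and most recent last.
    Returns the suffix starting at the last +1, or None if the window contains no +1.
    """
    for k in range(len(past) - 1, -1, -1):
        if past[k] == 1:
            return past[k:]
    return None  # context not determined by this window
```

For instance, the window `[-1, 1, 1, -1]` has context `[1, -1]`, that is, the element $1(-1)$ of the skeleton.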

The reason why we introduced skeletons is that they give us a nice way to formalize
the notion of localized continuity, which extends the continuity assumption. In
Section 5 this notion is explained by means of examples.

Definition 3.3. A kernel $P$ belongs to the class of locally continuous kernels with
respect to the skeleton $\tau$ if for any $v \in {}^{<\infty}\tau$, the sequence $\{\alpha_k^v\}_{k \ge 0}$ defined by

$$\alpha_k^v := \inf_{a_{-k}^{-1} \in A^k} \sum_{a \in A} \inf_{\underline{z}} P(a|v a_{-k}^{-1}\underline{z}), \quad k \ge 0, \qquad (10)$$

converges to 1. We will denote this class by LC($\tau$). The probabilistic skeleton (p.s.)
of $P$ is the pair $(\tau, p)$ where $p := \{p(a|v)\}_{a \in A, v \in {}^{<\infty}\tau}$,

$$p(a|v) := \inf_{\underline{z}} P(a|v\underline{z}) \qquad (11)$$

and $p(a|v) = P(a|v)$ for any $v \in {}^{\infty}\tau$.

Some observations on the above definitions.

Observation 3.1.

1. If $P$ is LC($\tau$) then all pasts $\underline{a}$ such that $|c_\tau(\underline{a})| < +\infty$ are continuity points
for $P$. On the other hand, we require nothing on the points $\underline{a}$ such that
$|c_\tau(\underline{a})| = \infty$. In practice, we will see later that these will be the pasts with
slow continuity rate (or even the discontinuous pasts).

2. Observe that for any fixed $v \in {}^{<\infty}\tau$, $\{p(a|v)\}_{a \in A}$ need not be a probability
distribution on $A$.

Our first main assumption for our results will be that $P$ is a probability kernel on
$A = \{1, 2, \ldots\}$ which is locally continuous with some p.s. $(\tau, p)$. We will furthermore
require that this p.s. is "good" in a sense we explain now. We first introduce
sequences of random variables which are obtained as coordinatewise functions of
the sequence $\mathbf{U}$. The first sequence, $\mathbf{Y}$, is defined as follows: for any $i \in \mathbb{Z}$,

$$Y_i := \sum_{a \in A} a \cdot \mathbf{1}\Big\{\sum_{j=0}^{a-1} \alpha(j) \le U_i < \sum_{j=0}^{a} \alpha(j)\Big\} + \star \cdot \mathbf{1}\{U_i \ge \alpha_{-1}\}, \qquad (12)$$

where $\alpha(0) := 0$. For any $a = a_{-\infty}^{+\infty} \in A^{\mathbb{Z}}$, we also define the sequence of r.v.'s $\mathbf{Y}(a)$
where for any $i \in \mathbb{Z}$,

$$Y_i(a) := Y_i \cdot \mathbf{1}\{Y_i \in A\} + a_i \cdot \mathbf{1}\{Y_i = \star\}.$$

Finally, we define the sequence $\{c_\tau^n\}_{n \in \mathbb{Z}}$ of maximum context length (based on $\mathbf{Y}$)
as

$$c_\tau^n := \sup_{a} |c_\tau(Y_{-\infty}^{n}(a))|, \quad n \in \mathbb{Z}, \qquad (13)$$

where the notation $c_\tau(\cdot)$ was introduced in Definition 3.1. Observe that the event
$\{c_\tau^n \le k\}$ is $\mathcal{F}(Y_{n-k+1}^{n})$-measurable.

Definition 3.4. Any time belonging to the set

$$\{i \le m : Y_j \in A \text{ or } c_\tau^j \le j - i,\ j = i, \ldots, m\} \qquad (14)$$

is called a good coalescence time for time $m$. We say that $(\tau, p)$ is a good probabilistic
skeleton if $\theta[0]$, the supremum over the set (14) when $m = 0$, has finite expectation.
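
The coordinatewise construction of $\mathbf{Y}$ from $\mathbf{U}$ can be sketched as follows. This is an illustration under an explicit assumption about the reading of (12): the interval $[0, \alpha_{-1}[$ is cut into consecutive cells of lengths $\alpha(1), \alpha(2), \ldots$, a uniform falling in the cell of $a$ yields the symbol $a$, and a uniform at or above $\alpha_{-1}$ yields the extra symbol $\star$. The function name `y_from_u`, the finite list `alpha` (a truncated alphabet $\{1, \ldots, n\}$) and the string `"*"` standing for $\star$ are hypothetical.

```python
STAR = "*"  # stands for the extra symbol used when the uniform does not decide the value

def y_from_u(u, alpha):
    """Map one uniform u in [0, 1) to a symbol of {1, ..., len(alpha)} or to STAR.

    alpha[a - 1] plays the role of alpha(a), a lower bound on the probability of symbol a
    that holds whatever the past; sum(alpha) plays the role of alpha_{-1}.
    """
    acc = 0.0
    for a, weight in enumerate(alpha, start=1):
        acc += weight
        if u < acc:
            return a  # u fell in the cell attached to symbol a
    return STAR  # u is at or above alpha_{-1}: the symbol depends on the past
```

With `alpha = [0.2, 0.3]`, a uniform below 0.2 gives the symbol 1, a uniform in $[0.2, 0.5[$ gives 2, and larger uniforms give $\star$; the larger $\alpha_{-1}$ is, the more often the value is settled without looking at the past.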



Observation 3.2.

1. Any good coalescence time for time $m$ is measurable with respect to $\mathcal{F}(Y_{-\infty}^{m})$ (less
information than $\mathcal{F}(U_{-\infty}^{m})$).

2. Assuming that $(\tau, p)$ is a good p.s. of $P$ implies that there exists a set $A(\tau) \subset A$
such that $\inf_{\underline{z}} P(a|\underline{z}) > 0$ for all $a \in A(\tau)$. In other words, this means that we
assume weak non-nullness for $P$.

3. Observe that, under the assumption that $(\tau, p)$ is a good p.s., we have $c_\tau^j < +\infty$,
$\mathbb{P}$-a.s., for any $j \in \mathbb{Z}$.

4. Main result and direct consequences 

We will say that a non-negative sequence $\{c_n\}_{n \in \mathbb{N}}$ decays exponentially fast to zero
if there exist a constant $D > 0$ and a real number $0 < d < 1$ such that $c_n \le D d^n$
for any $n$. We say that a real-valued random variable $W$ has exponential tail if
$\{\mathbb{P}(|W| > n)\}_{n \in \mathbb{N}}$ decays exponentially fast to zero, and summable tail if
$\{\mathbb{P}(|W| > n)\}_{n \in \mathbb{N}}$ is summable.

For any skeleton $\tau$, let

$$N(\tau) := \{i \ge 1 : \exists v \in \tau,\ |v| = i\}.$$

In the sequel, one of the main characteristics of a kernel in LC($\tau$) will be the sequence
of sequences $\{\{\alpha_k^i\}_{k \ge 0}\}_{i \in N(\tau)}$, defined as

$$\alpha_k^i := \inf_{v \in {}^{\le i}\tau} \alpha_k^v, \qquad (15)$$

where ${}^{\le i}\tau$ denotes the set of contexts in $\tau$ having length smaller than or equal to $i$.
Observe that there is a notational similarity between the case where the exponent
is an integer $i$ and the case where it is an element $v$ of $\tau$.

Theorem 4.1. Consider a kernel $P$ belonging to LC($\tau$) and assume that its probabilistic
skeleton $(\tau, p)$ is good with good coalescence time $\theta[0]$ for time 0. Let
$A_0 := \alpha_{-1}$ and for any $k \ge 1$, denote

$$A_k := \big\{1 - (\mathbb{E}|\theta[0]| + 1)\,\mathbb{P}(U_0 > \alpha_k^{c_\tau^0})\big\} \vee \alpha_{-1}.$$

Then, we can construct for $P$ an update function $F$ and a corresponding coalescence
time $\theta$ such that

(i) If $\sum_{k \ge 1} \prod_{i=1}^{k} A_i = +\infty$, then $\theta[0]$ is $\mathbb{P}$-a.s. finite.

(ii) If $\sum_{k \ge 0} (1 - A_k) < +\infty$, then $\theta[0]$ has summable tail.

(iii) If the good coalescence time $\theta[0]$ of the skeleton has exponential tail and
$\{1 - A_k\}_{k \ge 0}$ decays exponentially fast to zero, then the coalescence time $\theta[0]$
constructed for $P$ has exponential tail.

In particular, in each of these regimes, the CFTP with update function $F$ is feasible.

The proof of this result is given in Section 6. Section 5 will discuss explicit examples.
We now state some direct consequences of Theorem 4.1.

Theorem 4.1 states, in particular, that the CFTP algorithm is feasible with some
function $F$ (which will be constructed in the proof). We recall that this means that
the algorithm constructs, for any $i \in \mathbb{Z}$, an almost surely finite sample $[\Phi(\mathbf{U})]_{\theta[i]}^{i}$,
which is a deterministic function of $\mathbf{U}$. In the sequel, we will often write $X_i$ for
$[\Phi(\mathbf{U})]_i$ (and $\mathbf{X}$ for $[\Phi(\mathbf{U})]_{-\infty}^{+\infty}$) in order to avoid overloaded notation, keeping in
mind the fact that for any $i$, $X_i$ is constructed as a deterministic function of $\mathbf{U}$.
Actually, by Theorem 4.1, $X_i$ depends only on a $\mathbb{P}$-a.s. finite part of this sequence
since $X_i := [\mathbf{X}(\ldots, u_{\theta[i]-2}, u_{\theta[i]-1}, U_{\theta[i]}, \ldots, U_i, U_{i+1}, \ldots)]_i$ for any $u \in [0, 1[^{\mathbb{Z}}$. As we already
mentioned in Section 2, the existence of a perfect simulation algorithm has important
consequences, stated in the next two corollaries, whose proofs use standard
arguments given, for example, in Comets et al. (2002).



Corollary 4.1. (Existence and uniqueness). $\mathbf{X}$ is stationary and compatible with
$P$. Denoting by $\mu$ the stationary measure of $\mathbf{X}$ we have

$$\mu(\cdot) := \mathbb{P}(\Phi(\mathbf{U}) \in \cdot).$$

Moreover, $\mu$ has support on the set of continuous pasts of $P$.



When $\mathbb{E}|\theta[0]| < +\infty$, the chain $\mathbf{X}$ exhibits a regeneration scheme. We call time $t$
a regeneration time for the chain $\mathbf{X}$ if $\theta[t, +\infty] = t$. Define the chain $\mathbf{T}$ on $\{0, 1\}$
by $\mathbf{T}_j := \mathbf{1}\{j = \theta[j, +\infty]\}$. Then, consider the sequence of time indexes $(T_l)_{l \in \mathbb{Z}}$ defined
by $\mathbf{T}_j = 1$ if and only if $j = T_l$ for some $l$ in $\mathbb{Z}$, $T_l < T_{l+1}$, and with the convention
$T_0 \le 0 < T_1$. We say that $\mathbf{X}$ has a regeneration scheme if $\mathbf{T}$ is a renewal chain (that
is, if the increments $(T_{i+1} - T_i)_{i \in \mathbb{Z}}$ are independent, and are identically distributed
for $i \neq 0$).
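
The indexing convention for regeneration times can be made concrete on a finite window. The sketch below assumes the 0/1 values of the indicator chain are given as a list `flags`, with `flags[i]` attached to time `start + i`; the function name `label_regeneration_times` and this input format are illustrative. It labels the regeneration times so that the one labelled 0 is the last one at or before time 0, and returns the increments between consecutive regeneration times.

```python
def label_regeneration_times(flags, start):
    """Label regeneration times in a finite window.

    flags[i] is the 0/1 indicator at time start + i. Returns (labelled, increments):
    labelled maps l to the l-th regeneration time, with labelled[0] <= 0 < labelled[1]
    whenever both exist in the window; increments lists consecutive differences.
    """
    times = [start + i for i, flag in enumerate(flags) if flag == 1]
    non_positive = [t for t in times if t <= 0]
    if not non_positive:
        raise ValueError("no regeneration time at or before 0 in this window")
    zero_pos = len(non_positive) - 1  # position in `times` of the time labelled 0
    labelled = {pos - zero_pos: t for pos, t in enumerate(times)}
    increments = [later - earlier for earlier, later in zip(times, times[1:])]
    return labelled, increments
```

For a window covering times $-5, \ldots, 4$ with indicator equal to 1 at times $-4, -1, 2, 3$, the labels are $T_{-1} = -4$, $T_0 = -1$, $T_1 = 2$, $T_2 = 3$ and the increments are $3, 3, 1$.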

Corollary 4.2. (Regeneration scheme). Under conditions (ii) and (iii) of Theorem
4.1, the chain $\mathbf{X}$ has a regeneration scheme. The random strings
$(\Phi(\mathbf{U})_{T_i}, \ldots, \Phi(\mathbf{U})_{T_{i+1}-1})_{i \neq 0}$ are i.i.d. and have finite expected size. Under the
stronger requirement of (iii), the lengths of these strings have exponential tail.

In words, this corollary states that the unique stationary chain compatible with
$P$, under the conditions of Theorem 4.1, can be viewed as an i.i.d. concatenation
of strings of symbols of $A$ having finite expected size. A similar result was
first obtained in Lalley (1986) for one-dimensional Gibbs states under appropriate
conditions on the continuity rate, and then in Comets et al. (2002) under weaker
conditions. Our result strengthens all these results and, in particular, since continuity
is not assumed here, our chains are not even necessarily Gibbsian. This lack of
Gibbsianity is easy to establish using the recent result of Fernández et al. (2011), in
which it has been shown that the unique stationary chain compatible with the p.s.
$(\tau, p)$ (where $\tau$ is the tree corresponding to a regeneration process) is not always Gibbsian,
even when it satisfies continuity and $\alpha(a) > 0$ for all $a \in A$.

5. Applications 



In Section 5.1 we will explain how Theorem 4.1 reads in two special cases of local
continuity, which are the strong local continuity and the uniform local continuity.
Then, we will present several examples that illustrate our results: in Section 5.2 we
consider local continuity with respect to two special cases of skeletons and finally,
Section 5.3 is dedicated to explicit examples.



5.1. Specific local continuities

Uniform local continuity.

Definition 5.1. A kernel $P$ belongs to the class of uniformly local continuous kernels
with respect to the skeleton $\tau$ if

$$\alpha_k := \inf_{i \in N(\tau)} \alpha_k^i \xrightarrow[k \to +\infty]{} 1. \qquad (16)$$

We will denote this class by ULC($\tau$).

When $P$ belongs to ULC($\tau$), we can use the facts that $\alpha_k := \inf_{i \in N(\tau)} \alpha_k^i$ converges
to 1, and that $U_0$ is independent of $c_\tau^0$ (since the latter is $\mathcal{F}(U_{-\infty}^{-1})$-measurable) to
obtain

$$\mathbb{P}(U_0 > \alpha_m^{c_\tau^0}) = \sum_{k} \mathbb{P}(U_0 > \alpha_m^{k})\,\mathbb{P}(c_\tau^0 = k) \le \mathbb{P}(U_0 > \alpha_m) \sum_{k} \mathbb{P}(c_\tau^0 = k) \le 1 - \alpha_m. \qquad (17)$$

We have therefore proved the following corollary.

Corollary 5.1. Restricting the assumptions of Theorem 4.1 to the case of ULC($\tau$)
kernels, the same statements hold substituting $A_m$ by $1 - \mathbb{E}(|\theta[0]| + 1)(1 - \alpha_m)$, for
any $m \ge 1$.
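
The substitution in Corollary 5.1 is simple enough to explore numerically. The sketch below computes the substituted quantities $1 - \mathbb{E}(|\theta[0]| + 1)(1 - \alpha_m)$ from a finite list of values $\alpha_1, \ldots, \alpha_M$ and a value for $\mathbb{E}|\theta[0]|$, together with the partial sums that enter the summability and divergence criteria of Theorem 4.1. The function names, the truncation to finitely many terms, and the optional `floor` argument (an assumption added here so that the products stay non-negative) are illustrative; finite partial sums can only suggest, not prove, convergence or divergence.

```python
def substituted_A(alphas, mean_abs_theta, floor=0.0):
    """Values 1 - (E|theta[0]| + 1) * (1 - alpha_m), clipped below at `floor` (assumption)."""
    factor = mean_abs_theta + 1.0
    return [max(1.0 - factor * (1.0 - a), floor) for a in alphas]

def partial_sum_of_products(values):
    """Partial sum of the running products values[0], values[0]*values[1], ..."""
    total, product = 0.0, 1.0
    for v in values:
        product *= v
        total += product
    return total

def partial_tail_mass(values):
    """Partial sum of (1 - v) over the given values."""
    return sum(1.0 - v for v in values)
```

For example, with $\mathbb{E}|\theta[0]| = 1$ and $\alpha_m = 1 - 2^{-m}$ the substituted values are $1 - 2^{1-m}$, so the terms $1 - A_m$ decay geometrically.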



As we will explain in Section 5.2, Theorem 4.1 in Comets et al. (2002) and Theorem
1 in Gallo & Garcia (2010) are particular cases of uniform local continuity where
$\tau = \emptyset$ and $\tau$ has a terminal string (see Definition 5.3), respectively.

Explicit examples of this regime are given by Examples 5.1, 5.2 and 5.4.



Strong local continuity.

Definition 5.2. A kernel $P$ belongs to the class of strongly local continuous kernels
with respect to the skeleton $\tau$ if for any $v \in {}^{<\infty}\tau$, there exists a positive integer $h(v)$
such that for any $k \ge h(v)$, $\alpha_k^v = 1$. We will denote this class by SLC($\tau$).

These kernels belong to a particular class of probability kernels known in the literature
under the name of probabilistic context trees, which were introduced by
Rissanen (1983).

When $P$ belongs to SLC($\tau$) on a finite alphabet, we can use the fact that for any
$i \in N(\tau)$, $\alpha_k^i = 1$ for any $k \ge h(i)$, where $h(i) := \sup_{v \in {}^{\le i}\tau} h(v)$, and we obtain, using
$h^{-1}(i) := \inf\{k \ge 1 : h(k) > i\}$,

(18)

We have therefore proved the following corollary.

Corollary 5.2. Restricting the assumptions of Theorem 4.1 to the case of SLC($\tau$)
kernels on a finite alphabet, the same statements hold substituting $A_m$ by
$1 - \mathbb{E}(|\theta[0]| + 1)\,\mathbb{P}(c_\tau^0 > h^{-1}(m))$, for any $m \ge 1$.

As we will explain in Section 5.2, the results of Gallo (2011) are a particular case
of strong local continuity where $\tau$ has a terminal string (see Definition 5.3). Owing
to Corollary 4.1, the compatible stationary measure has support on the set of finite
length contexts. An explicit example of such a regime is given by Example 5.5.



5.2. Specific skeletons

Skeleton $\tau = \emptyset$.

Comets et al. (2002) assumed that $\sum_{a \in A} \inf_{\underline{z}} P(a|\underline{z}) > 0$ (weak non-nullness), and
that the sequence $\{\omega_k\}_{k \ge 0}$, defined by (3), satisfies

$$\sum_{k \ge 1} \prod_{j=0}^{k-1} \omega_j = +\infty. \qquad (19)$$

This implies, in particular, that $P$ is continuous. Without further information on $P$,
this uniform convergence assumption gives us the possibility of using any skeleton
$\tau$. In order to fix ideas, we will choose $\tau = \emptyset$, which is the simplest skeleton, and
$P$ is then ULC($\tau$). Thus we have $\alpha_k := \omega_k$ for any $k \ge 0$, and $p := \{p(a|v)\}_{a \in A, v \in {}^{<\infty}\tau}$
is in fact $\{\alpha(a)\}_{a \in A}$. In this case, since all the contexts of $\tau = \emptyset$ have length 0, we
have $\theta[0] = 0$, $\mathbb{E}|\theta[0]| + 1 = 1$ and thus $A_k = \alpha_k = \omega_k$. This shows that our Corollary
5.1 retrieves the results of Comets et al. (2002).



Skeletons with a terminal string.

Definition 5.3. We say that $w$ is a terminal string for a skeleton $\tau$ if for any
$v \in {}^{<\infty}\tau$ we have $v_{-|v|+i}^{-|v|+|w|-1+i} \neq w$, $i = 1, \ldots, |v| - |w|$.

Proposition 5.1. Consider a probabilistic skeleton $(\tau, p)$ for which $\tau$ has a terminal
string $w$, and $p = \{p(a|v)\}_{a \in A, v \in {}^{<\infty}\tau}$ satisfies $\inf_{i=1,\ldots,|w|} \inf_{v \in \tau} p(w_{-i}|v) = \epsilon$ for
some $\epsilon > 0$. These p.s.'s are good and the corresponding good coalescence time has
exponential tail. In the particular case where $|w| = 1$, we have
$\mathbb{P}(\theta[0] \le -n) = (1 - \epsilon)^n$, $n \ge 1$.

The proof of this proposition is immediate once we observe that

$$\theta[0] \ge \sup\{i \le -|w| + 1 : Y_i^{i+|w|-1} = w\}.$$

We have the following for skeletons with a terminal string.

• When $P$ is in SLC($\tau$), Corollary 5.2 extends Theorem 5.1 of Gallo (2011) (as
explained by Example 5.5 below).

• When $P$ is in ULC($\tau$), Corollary 5.1 extends Theorem 1 of Gallo & Garcia
(2010).



5.3. Examples 

Several examples of continuous and discontinuous kernels can be found in the literature
of perfect simulation for chains with infinite memory (we refer to Comets
et al. (2002), Gallo (2011) or

Kernels that are locally continuous are neither necessarily discontinuous nor necessarily
continuous. An important aspect of the present work is that we not only
want to consider discontinuous kernels, but also to "speed up" the CFTP algorithms
that have been proposed in the literature, in the sense that the tail distribution of
the coalescence time $\theta[0]$ of our CFTP will decay faster to zero. In some cases, it
will decay to zero (and therefore $\theta[0]$ will be a.s. finite) while other CFTP's of the
literature will not be feasible since their coalescence times are not a.s. finite.

We now present five examples on the binary alphabet $A = \{-1, +1\}$.

5.3.1. Using $\emptyset$ as skeleton

Example 5.1. Our first example is the well-known binary auto-regressive (AR) process,
which is precisely the main example presented in Comets et al. (2002). These
models are defined using a continuously differentiable increasing function
$\psi : \mathbb{R} \to\ ]0, 1[$ and an absolutely summable sequence of real numbers $\{\theta_n\}_{n \ge 0}$:

$$P(1|\underline{a}) := \psi\Big(\theta_0 + \sum_{k \ge 1} \theta_k a_{-k}\Big), \quad \forall \underline{a} \in A^{-\mathbb{N}}.$$

Such kernels are continuous since for any $\underline{a}$, we have

$$\omega_k(\underline{a}) = \inf_{\underline{z}} P(1|a_{-k}^{-1}\underline{z}) + \inf_{\underline{z}} P(-1|a_{-k}^{-1}\underline{z})$$
$$= \inf_{\underline{z}} \psi\Big(\theta_0 + \sum_{i=1}^{k} a_{-i}\theta_i + \sum_{i \ge k+1} z_{-i+k}\theta_i\Big) + 1 - \sup_{\underline{z}} \psi\Big(\theta_0 + \sum_{i=1}^{k} a_{-i}\theta_i + \sum_{i \ge k+1} z_{-i+k}\theta_i\Big)$$

and using the mean value theorem, we have

$$\omega_k(\underline{a}) = 1 - 2\psi'(c) \sum_{i \ge k+1} |\theta_i|$$

for some real number $c = c(a_{-k}^{-1})$ in the interval

$$\Big[\theta_0 + \sum_{i=1}^{k} a_{-i}\theta_i - \sum_{i \ge k+1} |\theta_i|,\ \ \theta_0 + \sum_{i=1}^{k} a_{-i}\theta_i + \sum_{i \ge k+1} |\theta_i|\Big].$$

Due to the assumption that $\{\theta_n\}_{n \ge 0}$ is an absolutely summable sequence of real
numbers, we have that $P$ is continuous in every $\underline{a}$. Moreover, we observe that
the rate at which $\omega_k(\underline{a})$ converges to 1 is controlled (exclusively) by the rate at
which $r_k := \sum_{i \ge k+1} |\theta_i|$ converges to 0, independently of $\underline{a}$. This is because for any
$\underline{a}$, $\psi'(c(a_{-k}^{-1}))$ converges to a positive constant (if we exclude the trivial case of $\psi$
constant). In other words, denoting by $\bar{c} := \sup_{\underline{a}} \psi'(c)$ and $C := \inf_{\underline{a}} \psi'(c)$, we have

$$1 - 2\bar{c} \sum_{i \ge k+1} |\theta_i| \le \omega_k(\underline{a}) \le 1 - 2C \sum_{i \ge k+1} |\theta_i| \qquad (20)$$

showing that there is nothing to gain in using the notion of local continuity. Our
conditions for perfect simulation of this model are, applying Corollary 5.1 to the case
where $\tau = \emptyset$, the same as in Comets et al. (2002), as we said in the first part of
Section 5.2.
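
The continuity rate of the binary AR kernel can be computed exactly in a concrete instance, since $\psi$ is increasing and the unknown part of the past contributes anything between $-r_k$ and $+r_k$ to the argument of $\psi$. The sketch below is an illustration under explicit assumptions that are not in the text: $\psi$ is the logistic function, $\theta_i = 2^{-i}$ for $i \ge 1$ (so that $r_k = 2^{-k}$ in closed form), and the names `logistic` and `omega_k` are hypothetical.

```python
import math

def logistic(x):
    """An increasing, continuously differentiable map from the reals into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def omega_k(recent, theta0=0.0, psi=logistic):
    """Continuity rate omega_k of the binary AR kernel with theta_i = 2**-i (assumption).

    recent = [a_{-1}, ..., a_{-k}] with entries in {-1, +1}. The fixed part of the
    argument is s; the free part of the past ranges over [-tail, +tail], so
    omega_k = 1 - (psi(s + tail) - psi(s - tail)) because psi is increasing.
    """
    k = len(recent)
    s = theta0 + sum(a * 2.0 ** -i for i, a in enumerate(recent, start=1))
    tail = 2.0 ** -k  # equals the sum of 2**-i over i > k
    return 1.0 - (psi(s + tail) - psi(s - tail))
```

Since the derivative of the logistic function is at most $1/4$, the mean value theorem gives $1 - \omega_k \le 2 \cdot \tfrac{1}{4} \cdot 2^{-k}$ in this instance, whatever the recent past, which is the uniform control expressed by (20).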

5.3.2. A continuous case for which $\tau = \emptyset$ is not enough

In certain conditions, $\omega_k$ may increase very slowly to 1. In these cases, the algorithm
of Comets et al. (2002) can be very slow to stop, or may not be feasible. That is, the
coalescence time $\theta[0]$, or, roughly speaking, the random number of steps operated
by the algorithm, may have no first moment or even may not be almost surely finite.
For a given kernel $P$, a simple reason for which $\omega_k$ could increase slowly to 1 is that
some pasts $\underline{a}$ could have a very slow continuity rate. Since the definition of $\omega_k$ is
uniform on the pasts, it has to take into account these bad pasts as well. Provided
we have the information of the position of these bad pasts, our method allows us to
create a skeleton $\tau$ for which the set of infinite size contexts is composed of these bad
pasts, and to work separately on the problem of the resulting p.s., asking whether
it is good or not (according to Definition 3.4). We observed that in the example of
the AR processes, every past $\underline{a}$ has the same continuity rate, and therefore the only
natural skeleton was $\emptyset$. We now present an example in which this is not the case.



Example 5.2. This example is an unpublished example presented in De Santis & Piccioni (2010). First, define for any $\sigma \in (0,1)$ and any $a \in \{-1,1\}^{-\mathbb{N}}$

$$T_\sigma(a) := \inf\Big\{k \ge 1 : \frac{1}{k}\sum_{i=1}^{k}\mathbf{1}\{a_{-i} = 1\} > \sigma\Big\},$$

with the convention that $T_\sigma(a) = +\infty$ if the set of indexes is empty. This is the first time the proportion of 1's is larger than $\sigma$, when we look backwards in the sequence $a$. Then, consider two summable sequences $\{\beta(i)\}_{i\ge1}$ and $\{\gamma(i)\}_{i\ge1}$, such
that 7(z) < and three real numbers 61 G (0, 1), c > and a > satisfying 



The kernel P on {—1,1} is defined by 



P{l\a) = bJl~cJ2 {PiWa-i = -1, T^iad > ^} + = -1, ^a(a) < 

\ i>l ' 

(21) 

We observe that for pasts $a$ such that $T_\sigma(a) = \infty$, the continuity rate is controlled by $\{\beta(i)\}_{i\ge1}$, since $\omega_k(a) = 1 - b_1 c\sum_{i\ge k+1}\beta(i)$, while for pasts $a$ such that $T_\sigma(a) < \infty$, the continuity rate is controlled by $\{\gamma(i)\}_{i\ge1}$, since $\omega_k(a) = 1 - b_1 c\sum_{i\ge k+1}\gamma(i)$ for $k \ge T_\sigma(a)$. Contrary to Example 5.1, we may have something to gain in using the notion of local continuity, depending on the tails of the series of both sequences. As we will see, a natural choice for the skeleton is



Proposition 5.2. For any $\sigma \in (0,1)$, the p.s. $(\tau^\sigma, p)$, where $\inf_{v\in\tau^\sigma} p(1|v) > \sigma$, is good, and has a good coalescence time $\theta[0]$ with exponential tail.

Proof The chain Y takes value 1 with probability a and * with probability 1 — cr. By 
the definition of r°", we see that ^[0] := min{i > 1 : - > o"} is a stopping 



time in the past of the form of ( 14 ) . On the other hand, we observe that this random 



variable has the same tail distribution as ^[0] := min{z > 1 : } J2]=o ^(0)i — ^} 
since YfO) is i.i.d. Then, 



< -N) = nm >N)<f[j^Yl ^(0)^' < ^ 

\ j=0 

which decays exponentially by the well-known Chernoff bound. □
Now, we notice that, for any i G N{t)



inf y^infPl 

a-} -■.T(a)<k+i ^ 

-k-i y->- ^ aeA 



bic 7(i), 



which in turn implies that $\alpha_k \ge \inf_{i\ge1}\big[1 - b_1 c\sum_{j\ge k+i}\gamma(j)\big] \ge 1 - b_1 c\sum_{j\ge k+1}\gamma(j)$. It follows from the summability of $\{\gamma(i)\}_{i\ge1}$ that $P \in ULC(\tau^\sigma)$. Therefore, using Proposition 5.2, we can apply Corollary 5.1 to show that perfect simulation can be done without assuming $\sum_{i\ge1} i\gamma(i) < +\infty$ (an assumption required by De Santis & Piccioni (2010)). Moreover, depending on the rate at which $b_1 c\sum_{j\ge k+1}\gamma(j)$ converges to 0, we obtain the several regimes stated in Theorem 4.1. All this occurs independently of the sequence $\{\beta(i)\}_{i\ge1}$, which is the sequence that controls the tail of the CFTP of Comets et al. (2002).






5.3.3. Three examples in LC{t) 

Let $\ell(a)$ denote the first time we see a 1 when we look backward in $a$. Formally, $\ell(\underline{-1}) := +\infty$ and for any $a \ne \underline{-1}$,

$$\ell(a) := \inf\{i \ge 1 : a_{-i} = 1\}.$$
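
As a small illustration of this definition, here is a Python sketch of $\ell$ evaluated on a finite window of the past. The function name `first_one` and the list encoding of the past (entry `i-1` holds $a_{-i}$) are assumptions made for the example only.

```python
import math

def first_one(past):
    """Return inf{i >= 1 : a_{-i} = 1} over a finite window of the past.

    past[i-1] holds a_{-i} in {-1, +1}. math.inf is returned when no 1 is
    seen in the window; the true value is then undetermined unless the
    window is the whole past.
    """
    for i, symbol in enumerate(past, start=1):
        if symbol == 1:
            return i
    return math.inf
```

Whenever the returned value is finite, it is determined by the first $\ell(a)$ symbols of the past alone, which is what makes $\ell$ usable to define a skeleton.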



Example 5.3. The following example was presented in De Santis & Piccioni (2012) (see Example 2 therein). It is a kernel belonging to $LC(\tau)$, where $\tau$ is defined by (8), but it belongs neither to $ULC(\tau)$ nor to $SLC(\tau)$. Let $P(1|\underline{-1}) = \epsilon > 0$, and for any $a \ne \underline{-1}$

$$P(a|a) = \epsilon + (1-2\epsilon)\sum_{n\ge1}\mathbf{1}\{a = a_{-\ell(a)-n}\}\, q_n^{\ell(a)},$$

where, for any $l \ge 1$, $\{q_n^l\}_{n\ge1}$ is a probability distribution on the integers. This kernel has a discontinuity along $\underline{-1}$; in fact, we have

$$\omega_k = \omega_k(\underline{-1}) = \inf_z P(1|\underline{-1}_{-k}^{-1}z) + 1 - \sup_z P(1|\underline{-1}_{-k}^{-1}z) = \epsilon + 1 - \epsilon - (1-2\epsilon) = 2\epsilon < 1.$$
On the other hand, it belongs to $LC(\tau)$ since for any $j \ge 1$



«1 



inf inf > inf Pi 

.iC<.7V „ — 1,- Ak ' ^ Z 



inf inf J]infP(a|(-l)'-ila:^^) = 2e + (l-2e)inf^g,' 



i=l 



which goes to 1 since for any $j \ge 1$, $\{q_i^j\}_{i\ge1}$ is a probability distribution. Since the p.s. is $(\tau,p)$ with $p(a|v) \ge \epsilon$ for any $v \in \tau$, and since $\tau$ has 1 as terminal string, it follows by Proposition 5.1 that it is a good p.s., with $\theta[0]$ satisfying $\mathbb{P}(\theta[0] < -n) = (1-\epsilon)^n$. By Theorem 4.1, the tail distribution of the CFTP is related to the

h l)P(f/o > af^ ), and using the expression 



tail distribution of 1 — At 



E 



obtained above for a^l, we obtain 



;i-2e)(l + e) 



i>i 



i=l 



In order to fix ideas, we will take, as suggested by De Santis & Piccioni ( |2012 ), 
Eti^l ~ 1 - (1 - /, A; > 1, with a > 0. We then have infi<j ^JL^ ~ 



1 - (1 - J 



-a\k+l 



and we observe that in this case, inf j>i = 2e for any k > 1, 



meaning that this kernel does not belong to $ULC(\tau)$, and for any $i$, $\alpha_k^i < 1$ for all $k \ge 1$, meaning that it does not belong to $SLC(\tau)$ either. In order to prove that a



CFTP is feasible, De Santis & Piccioni (2012) use the assumptions that a < 1 and 
that + e > 1. In our case, using the fact that F{c:^^ = j) = (1 — e)''~^e, we obtain 



i-A.<$^(i-er'(i-r") 



a\k+l 



and it follows that this kernel is in the regime (ii) of Theorem 4.1 since



- Ak) < J](l - ey-'f < oo , Va > 0. 



fc>0 



Observe that the restrictions that a < 1 and that Ooo + e > 1 do not appear here. 

Example 5.4. We now propose a simple extension of the AR processes which allows us to choose different models according to the past we consider. It provides us with an example of a kernel belonging to $ULC(\tau)$. Assume that we have two models parametrized by $(\psi, \{\theta_n\}_{n\ge0})$ and $(\bar\psi, \{\bar\theta_n\}_{n\ge0})$, the "standard model" and the "alternative model" respectively. Now, suppose that we choose the standard model when $\ell(a)$ is odd and the alternative one otherwise. That is,

$$P(1|a) = \begin{cases} \psi\big(\theta_0 + \sum_{k\ge1}\theta_k a_{-k}\big), & \text{if } \ell(a) \text{ is odd},\\[2pt] \bar\psi\big(\bar\theta_0 + \sum_{k\ge1}\bar\theta_k a_{-k}\big), & \text{if } \ell(a) \text{ is even}.\end{cases}$$

This model has a discontinuity at —1. To see this, observe that P(l|(— 1)1^,1) takes 
value ^ (^00 - J2i=i + En>fc+i ^n) (^0 " ELi + En>fc+i ' accordlug to 
k being odd or even, and therefore does not converge in k, as ip{6o) 7^ '^/'(^o)- Nev- 



ertheless, using similar calculations as in Example 5.1, we observe that this kernel 
belongs to LC(r), since takes value 1 — 2ip'{c) J2j>k+i l^il 2'?/''(c) J2j>k+i 
and converges to 1 as diverges. Since this quantity does not depend on i, it follows 
that ak := infj>i converges to 1 as well and therefore, P belongs to ULC(r). Us- 



ing Corollary |5 . 1 1 together with Proposition 5J_, we conclude that the rate at which 
1 — + 1)(1 — am) converges to zero controls the tail distribution of the coalescence 
time of the CFTP. 



Example 5.5. The following example is inspired by Gallo (2011). We let $h : \mathbb{N} \to 2\mathbb{N}+1$ be non-decreasing and unbounded, and for any $v \in A^*$ with $|v|$ odd, we let $\mathrm{Maj}(v)$ denote the symbol that appears most often in $v$ (Maj stands for majority here). Then, we put, for any $a$ with $\ell(a) = l \ge 1$

$$P(1|a) = \epsilon_l\,\mathbf{1}\{\mathrm{Maj}(a_{-l-h(l)}^{-l-1}) = -1\} + (1-\epsilon_l)\,\mathbf{1}\{\mathrm{Maj}(a_{-l-h(l)}^{-l-1}) = 1\}$$

where $\{\epsilon_l\}_{l\ge0}$ is a $]\epsilon, 1/2-\epsilon[$-valued sequence, with $0 < \epsilon < 1/4$. Put also $P(1|\underline{-1}) \ge \epsilon$. Here also, we can compute

$$\omega_k(\underline{-1}) = 1 - \sup_{l,m\ge k}|\epsilon_l - \epsilon_m|,$$

which will not converge to 1 if and only if $\{\epsilon_l\}_{l\ge0}$ does not converge. We assume therefore that this is the case, and since for any $a$ such that $\ell(a) = l < +\infty$ we have $\omega_k(a) = 1$ for any $k \ge l + h(l) + 1$, we conclude that $P$ is discontinuous and belongs to $SLC(\tau)$. Applying Corollary 5.2 together with Proposition 5.1, we conclude that
the way + l)P(c° > h ^{m)) = + 1)(1 — e)'* ^^"^^ converges to controls the
tail distribution of the coalescence time of the CFTP. For instance, when 



\ogh{k) 
hm sup — — - — < 1 , Ce 



logfl 



we get, as obtained by Theorem 1 in Gallo (2011), that the coalescence time of the 
CFTP has summable tail. 



6. Proof of Theorem 4.1

6.1. The update and the length function 

Let us say, before going into any further details, that the update function we will use is the same (with some simple changes to make it suitable when $P$ is not necessarily continuous) as the one used by Comets et al. (2002), which also underlies the works of Gallo (2011) and De Santis & Piccioni (2012). This update function is defined through the partition of $[0,1[$ represented in Figure 1, where for any symbol $a$ and any past $a$, the intervals have lengths

$$|I_0(a|\emptyset)| := \alpha_{-1}(a),$$
$$|I_k(a|a_{-k}^{-1})| := \inf_z P(a|a_{-k}^{-1}z) - \inf_z P(a|a_{-k+1}^{-1}z), \quad \forall k \ge 1,$$
$$|I_\infty(a|a)| := P(a|a) - \lim_{k\to\infty}\inf_z P(a|a_{-k}^{-1}z).$$

The only difference with the partition used by Comets et al. (2002) (see Figure 1 therein) is the addition of the intervals $I_\infty(a|a)$, due to the fact that when $P$ is not assumed to be continuous, $\omega_k(a)$ may not converge to 1 for some pasts. With this partition in hand, the update function is simple to define:

$$F(U_0, a) := \sum_{a\in A} a\,\mathbf{1}\Big\{U_0 \in \bigcup_{k\ge0} I_k(a|a_{-k}^{-1})\Big\}.$$

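Algorithmically, this update function is practical for the following reason. If we introduce the length function

$$L(U_0, a) := \sum_{k\ge0} k\,\mathbf{1}\Big\{U_0 \in \bigcup_{a\in A} I_k(a|a_{-k}^{-1})\Big\} + \infty\cdot\mathbf{1}\Big\{U_0 \in \bigcup_{a\in A} I_\infty(a|a)\Big\} \qquad (22)$$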
Figure 1: Illustration of the partition related to some infinite past $a$. [Figure not reproduced. Its labels are the intervals $I_0(1|\emptyset)$, $I_0(2|\emptyset)$, $I_1(1|a_{-1})$, $I_1(2|a_{-1})$, $I_2(\cdot|a_{-2}^{-1})$, ..., $I_k(\cdot|a_{-k}^{-1})$, $I_\infty(\cdot|a)$ and the points $\omega_1(a)$, $\omega_2(a)$, ..., $\omega_k(a)$.]

we observe that, whenever $L(U_0,a) \le k < \infty$ (this occurs when $U_0 < \omega_k(a)$), we have $F(U_0,a) = F(U_0, b\,a_{-k}^{-1})$ for any $b$. This means that the value of $F(U_0,a)$ can be decided by looking only at (at most) the $k$ last symbols of the infinite past $a$. In other words, once the algorithm has constructed $k$ symbols $a_{-k}^{-1}$, from, say, time $n-k$ to time $n-1$, the construction of the next symbol is possible when $L(U_n,a) \le k$, because this event only depends on $a_{-k}^{-1}$. This is not only a "practical" advantage, but also a mathematical advantage to prove that $\theta[0]$ (defined using this update function) is $\mathbb{P}$-a.s. finite.
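
The mechanism just described can be sketched in Python for a continuous kernel, for which the intervals $I_\infty$ are empty. The sketch below reuses a hypothetical logistic AR kernel with geometric coefficients; the kernel, the ordering of the symbols inside each level, and the function names are all assumptions chosen for illustration, not the construction of the paper. The function returns the symbol and the level $k$ at which the uniform variable falls, using only a finite window of the past.

```python
import math

ALPHABET = (-1, 1)        # assumed ordering of the symbols inside each level
RHO, THETA0 = 0.5, 0.0    # assumed AR coefficients: theta_i = RHO**i

def psi(x):
    return 1.0 / (1.0 + math.exp(-x))

def tail(k):
    """sum_{i >= k+1} |theta_i| for the geometric coefficients."""
    return RHO ** (k + 1) / (1.0 - RHO)

def inf_prob(symbol, past, k):
    """inf over z of P(symbol | a_{-k}^{-1} z), with past[i-1] = a_{-i}."""
    s = THETA0 + sum(RHO ** i * past[i - 1] for i in range(1, k + 1))
    if symbol == 1:
        return psi(s - tail(k))
    return 1.0 - psi(s + tail(k))

def update_and_length(u, past):
    """Return (F(u, a), L(u, a)) using only the known window `past`.

    Level k stacks the intervals I_k(symbol | a_{-k}^{-1}); their lengths are
    the increments of inf_prob between depth k-1 and depth k.
    Returns (None, None) when u lies beyond every level the window resolves.
    """
    left = 0.0
    previous = {a: 0.0 for a in ALPHABET}
    for k in range(len(past) + 1):
        current = {a: inf_prob(a, past, k) for a in ALPHABET}
        for a in ALPHABET:
            left += current[a] - previous[a]
            if u < left:
                return a, k
        previous = current
    return None, None
```

Two pasts that share their last $k$ symbols give the same answer whenever the returned level is at most $k$, which is the property used to build the chain from a finite portion of the past.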

Let us consider, for any $-\infty < m \le n \le +\infty$,

$$\theta'[m,n] := \max\{i \le m : \text{for any } a,\ L(aF_{\{i,j-1\}}(a,U_i^{j-1}), U_j) \le j-i,\ j = i,\ldots,n\}$$

with the convention that $\theta'[m] := \theta'[m,m]$ and where $F_{\{m,n\}}(a,U_m^n)$ denotes the whole constructed string based on the past $a$, that is, for any $-\infty < m \le n \le +\infty$



F{m,n}{a, U"^) := F(a, Um)F[m,m+l]{a, U, 



m+l^ 



Lemma 6.1. $\theta'[0] \le \theta[0]$.

Proof. We will prove that $\theta'[0] \in \{j \le 0 : F_{\{j,0\}}(a,U_j^0) \text{ does not depend on } a\}$. Assume $\theta'[0] = -k > -\infty$. We proceed recursively. First observe that $U_{-k} \in [0,\alpha_{-1}[$ since $L(U_{-k},a) = 0$ for any $a$, and therefore $F_{\{-k,-k\}}(a,U_{-k}) = F(a,U_{-k})$ is obtained independently of $a$. Suppose (recursion hypothesis) that for some $l \in \{1,\ldots,k-1\}$, the whole string $F_{\{-k,-l\}}(a,U_{-k}^{-l})$ has been constructed independently of $a$. Since $L(aF_{\{-k,-l\}}(a,U_{-k}^{-l}), U_{-l+1}) \le k-l+1$, it follows that the value of $F(aF_{\{-k,-l\}}(a,U_{-k}^{-l}), U_{-l+1})$ can be obtained independently of $a$, and thus, concatenating, we obtain that the whole string $F_{\{-k,-l+1\}}(a,U_{-k}^{-l+1})$ has been constructed independently of $a$, establishing the recursion from time $-k$ to time 0. □



6.2. Definition of a block-rescaled coalescence time 



Lemma 6.1 indicates that we can focus on $\theta'[0]$ instead of $\theta[0]$. However, the direct study of $\theta'[0]$ remains an intricate task, and our objective here is to introduce the coalescence time $\Lambda[0]$, defined by equation (26), which is easier to study. We will nevertheless need to introduce a sequence of technical definitions in order to get to
its definition. Let {9'^}k>-i be the sequence of r.v.'s defined by 9^^ := 1 and for any 

A; > 

and partition — N into disjoint blocks {Bk}k>Q where Bk = {0^, ■ ■ ■ ,9^^^ — 1}. 
We consider the sequence {Ci}iei defined for any i G Z by 

k>0 

and define for any A; > 

Lk := sup 0- (23) 

An important advantage of this block rescaling, which we will use later, is the fact that the sequence $\{L_k\}_{k\ge0}$ is i.i.d. This follows from the definition of good coalescence
time, which implies that for any j G Bk, k > 0, such that Uj > we have 

c'-' = snp\cr{Y{Bi)l-')\<j-9', (24) 

a 

and therefore, Q for i & Bk can be determined using the array {Uj}j^Bk- 
Now, introduce the random variable 



$$\Theta[0] = \Theta[0](U) := \sup\{n \le 0 : L_i \le i-n,\ i = n,\ldots,0\} \qquad (25)$$

which will play the role of a coalescence time of $B_0$ in the block-rescaled sequence. We finally define the main random variable of interest for the proof of the theorem

$$\Lambda[0] = \Lambda[0](U) := -\sum_{i=0}^{-\Theta[0]}|B_i|. \qquad (26)$$

We now state two important lemmas. Lemma 6.2 ensures that $\Lambda[0]$ is indeed a coalescence time for time 0, and Lemma 6.3 gives information on its tail distribution.
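
The backward search behind a definition of the form (25) can be sketched in a few lines of Python. The encoding is an assumption made for the example: `lengths[m]` holds the length variable attached to the block $m$ steps in the past, so that the index $i \le 0$ of (25) corresponds to `lengths[-i]`. The function name is hypothetical.

```python
def block_coalescence(lengths):
    """Largest n <= 0 such that L_i <= i - n for i = n, ..., 0.

    With i = -j and n = -m the condition reads lengths[j] <= m - j for
    j = 0..m, i.e. max_j (lengths[j] + j) <= m. Returns None when no such
    n exists within the supplied values.
    """
    reach = 0  # running maximum of lengths[j] + j over j <= m
    for m, value in enumerate(lengths):
        reach = max(reach, value + m)
        if reach <= m:
            return -m
    return None
```

The running maximum makes the search linear in the number of blocks inspected: block $m$ is a valid starting point exactly when no more recent block needs to look further back than $m$.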



Lemma 6.2. $\Lambda[0] \le \theta'[0]$.

Proof We will prove that A[0] belongs to the set of coalescence times {^ < : 
for any a, L{aF{ij_ij{a,U^^^),Uj) < j — = 0}. We proceed in three 

steps. 

Step 1. Observe that, for any A; > and any i E B^, Q = i — 0'' + l{Ui > 
(^^j^i l-Bfc+jlj , Wi E Bk and A; > 0, we can rewrite A[0] as max{i < m : 
Cj <j - i, 3 =i,---,n} 

Step 2. Since (i) |-Bfc| > 1, V/c > 0, (ii) for any i E Bk we have Lk > Q and (iii) 
for any i E B^ such that Ui > we have 

c;-l = sup|c.(r(a)^-f)|<^-^^ (27) 

a 

it follows that Q > ■= HUi > + 0- 

Step 3. We will prove that, for any realization u of the process U such that 
< k, we have L(aF^i_ki_iy(a,u\Zk)^''^i) ^ k for any a E A~^. This is trivial if 

ui < a-i, thus we assume that this is not the case. We have the following sequence 

of inequalities. First, by (13), supposing c^~^ = m 



\criaF{i_k,i-i}ia,Ui_,^))\ < c; 



J-i 



m. 



(28) 



This implies, together with (15), that, for any i > 



a. 



a7 < a 



and by (10) and ([2]), we have 



a 



Cr(aF{,_fc_,_i}(a,n,_J,)) 



<^i+|c.(aF{,_,,_i}(aX:i))|(«^0-M-l}fe«Lfc))- 
Since < k, we have, using the above inequalities



According to the partition defining F and L, this implies that the length function

L{aF{i_k,i-i}{a,u\zl),ui) < k - m + \cr{aF{i_k,i-i}{a,u\zl))\, and using one more 



time (28) concludes the proof of Step 3. 



We complete the proof of the lemma using steps 1, 2 and 3. 

□ 

Lemma 6.3 (Key-Lemma). Consider a kernel $P$ belonging to $LC(\tau)$ with good probabilistic skeleton $(\tau,p)$, and let $\theta[0]$ be the corresponding good coalescence time.

(i) If $\sum_{k\ge1}\prod_{i=0}^{k}\mathbb{P}(L_0 \le i) = +\infty$, then $\Lambda[0]$ is $\mathbb{P}$-a.s. finite.

(ii) If $\theta[0]$ has summable tail and $\sum_{k\ge0}\mathbb{P}(L_0 > k) < +\infty$, then $\Lambda[0]$ has summable tail.

(iii) If $\theta[0]$ has exponential tail and $\{\mathbb{P}(L_0 > k)\}_{k\ge0}$ decays exponentially fast to zero, then $\Lambda[0]$ has exponential tail.

In particular, since $\Lambda[0] \le \theta'[0] \le \theta[0]$, it follows that in each of these regimes, the same conclusion holds for all these coalescence times.



Proof. In order to control the tail distribution of $\Lambda[0]$ (see (26)), we first need to control the tail distribution of $\Theta[0]$. Observe that $\Theta[0]$ is defined over the block-rescaled sequence using the i.i.d. sequence of r.v.'s $\{L_j\}_{j\ge0}$ exactly as the coalescence time of site 0 was defined in Comets et al. (2002) (where it is denoted by $\tau[0]$). Indeed, since the $L_j$'s are i.i.d., and since $\mathbb{P}(L_0 = 0) \ge \mathbb{P}(|B_0| = 1) = \alpha_{-1} > 0$, we can invoke Theorem 4.1 item (iv) together with Proposition 5.1 in Comets et al. (2002) and obtain the following assertions:

(6.3.1) if $\sum_{k\ge1}\prod_{i=0}^{k}\mathbb{P}(L_0 \le i) = +\infty$, then $\Theta[0]$ is a.s. finite,

(6.3.2) if $\{\mathbb{P}(L_0 > k)\}_{k\ge0}$ is summable, then $\Theta[0]$ has summable tail,

(6.3.3) if $\{\mathbb{P}(L_0 > k)\}_{k\ge0}$ decays exponentially fast to 0, then $\Theta[0]$ has exponential tail.

Coming back to the definition (26) of $\Lambda[0]$, we now prove items (i), (ii) and (iii) of the lemma using respectively items (6.3.1), (6.3.2) and (6.3.3) we just stated. Item (i) is direct since the sum of an a.s. finite number of random variables which are a.s. finite is a.s. finite. For item (ii), we observe that



$$\sum_{i=0}^{n-1}|B_i| - n\,\mathbb{E}|B_0|$$

is a martingale with respect to the filtration $\mathcal{F}((L_0,|B_0|),\ldots,(L_i,|B_i|))$, $i \ge 0$. Moreover, $\Theta[0]$ is a stopping time with respect to the same filtration; this follows directly from the definition of $\Theta[0]$. Thus, by the Optional Stopping Theorem,

$$\mathbb{E}|\Lambda[0]| = \mathbb{E}\sum_{i=0}^{-\Theta[0]}|B_i| = \mathbb{E}|B_0|\cdot\mathbb{E}|\Theta[0]|,$$

which is finite by item (6.3.2) above, under the conditions of item (ii) of the key lemma.



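For the proof of item (iii), we use the proof of item (iv) of Lemma 14 in Harvey et al. (2007). Let $\rho := (2\mathbb{E}|B_0|)^{-1}$ and compute

$$\mathbb{P}(\Lambda[0] < -n) = \sum_{i\ge0}\mathbb{P}\Big(\Theta[0] = -i,\ \sum_{j=0}^{i}|B_j| > n\Big)$$
$$\le \sum_{i=0}^{\lfloor\rho n\rfloor}\mathbb{P}\Big(\Theta[0] = -i,\ \sum_{j=0}^{i}|B_j| > n\Big) + \sum_{j\ge\lfloor\rho n\rfloor+1}\mathbb{P}(\Theta[0] = -j)$$
$$\le \lfloor\rho n\rfloor\,\mathbb{P}\Big(\sum_{j=0}^{\lfloor\rho n\rfloor}|B_j| - \lfloor\rho n\rfloor\,\mathbb{E}|B_0| > n/2\Big) + \mathbb{P}(\Theta[0] < -\lfloor\rho n\rfloor). \qquad (29)$$

Because $|B_0|$ has exponential tail, a standard large deviation result (see for example Corollary 27.1 in Kallenberg (2002)) shows that the first term in the last line decays exponentially in $n$. This proves item (iii) using item (6.3.3) above. □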
6.3. Proof of Theorem 4.1



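Using Lemmas 6.1, 6.2 and 6.3, it remains to prove that $A_m \le \mathbb{P}(L_0 \le m)$, $m \ge 0$. For any $i \ge 0$

$$\mathbb{P}(L_0 > i) = \mathbb{P}\Big(\sup_{j\in B_0} C_j > i\Big) = \mathbb{P}\Big(\sum_{j=\theta[0]}^{0}\mathbf{1}\{C_j > i\} \ge 1\Big) \le \mathbb{E}\Big(\sum_{j=\theta[0]}^{0}\mathbf{1}\{C_j > i\}\Big). \qquad (30)$$

We have all the ingredients for applying the Wald inequality:

• By translation invariance, we have $\mathbb{E}\mathbf{1}\{C_j > i\} = \mathbb{E}\mathbf{1}\{C_0 > i\}$ for any $j \in \mathbb{Z}$.

• By our assumptions, $\mathbb{E}|\theta[0]| < \infty$.

• Finally, for any $n \ge 1$,

$$\mathbb{E}(\mathbf{1}\{C_{-n} > i\}\cdot\mathbf{1}\{\theta[0] \le -n\}) = \mathbb{E}\mathbf{1}\{C_{-n} > i\} - \mathbb{E}(\mathbf{1}\{C_{-n} > i\}\cdot\mathbf{1}\{\theta[0] > -n\})$$
$$= \mathbb{E}\mathbf{1}\{C_0 > i\} - \mathbb{E}\mathbf{1}\{C_{-n} > i\}\cdot\mathbb{E}\mathbf{1}\{\theta[0] > -n\}$$
$$= \mathbb{E}\mathbf{1}\{C_0 > i\}\cdot[1 - \mathbb{P}(\theta[0] > -n)]$$
$$= \mathbb{P}(C_0 > i)\cdot\mathbb{P}(\theta[0] \le -n)$$

where, for the second line, we used the fact that $\{\theta[0] > -n\}$ is measurable with respect to $\mathcal{F}(U_{-n+1}^{0})$, while $\mathbf{1}\{C_{-n} > i\}$ is $\mathcal{F}(U_{-\infty}^{-n})$-measurable.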

Thanks to all these facts, we can use Wald's equality, and obtain 

p(Lo >t)< E{\m\ + i)p(Co >^) = mm\ + ^muo > «■"), 

and since for any i > we have F{Lq < i) > ^{Lq = 0) > the proof of the theo- 
rem is concluded using Afc := {l - (IE|^[0]| + l)P(f/o > af ')|va_i. □ 

7. Relaxing the local continuity assumption 

In this section we propose an extension of the notion of local continuity. Local continuity corresponds to assuming that there exists a stopping time for the reversed-time process, beyond which the decay of the dependence on the past occurs uniformly. Removing this assumption means that no stopping time can tell whether or not the past we consider is a continuity point for $P$. This idea is now formalized in the following definition.
Definition 7.1. We will say that a kernel $P$ belongs to the class of extended locally continuous kernels with respect to some skeleton $\tau$ if for any $v \in {}^{<\infty}\tau$

$$\alpha_k^v := \inf_{v_1\in{}^{<\infty}\tau}\ \cdots\ \inf_{v_k\in{}^{<\infty}\tau}\ \sum_{a\in A}\inf_z P(a|v\,v_1 v_2\ldots v_k\,z) \xrightarrow[k\to+\infty]{} 1. \qquad (31)$$

We will denote this class by $\mathrm{extLC}(\tau)$. The probabilistic skeleton (p.s.) of $P$ is the pair $(\tau,p)$ where $p := \{p(a|v)\}_{a\in A, v\in\tau}$,

$$p(a|v) := \inf_z P(a|vz) \qquad (32)$$

for any $v \in {}^{<\infty}\tau$, and $p(a|v) = P(a|v)$ for any $v \in {}^{\infty}\tau$.

Theorem 7.1. Assume that P belongs to extLCfr) with good probabilistic skeleton. 
Assume furthermore that the good coalescence time 6[0] satisfies that < j — 6[0] 
for any j G {^[0], . . . , 0}. For any k > 0, denote 

Ak := |l - (E|^[0]| + l)F{Uo > af)] V A; > 0. 

Then, we can construct for P, an update function F and a corresponding coalescence 
time 9 such that 

(i) //" X]a:>i n^=o ^fc — +00, then 6'[0] is P-a.s. finite. 

(ii) //X]fc>o(-'- ^ < +00, then 9[Q\ has summable tail. 

(iii) If 0[Q] has exponential tail and {1 — Ak}k>o decays exponentially fast to zero, 
then ^[0] has exponential tail. 

In particular, in each of these regimes, the CFTP with update function F is feasible. 

Observe that the assumption that c{ < j — 0[0] for any j G {^[0],...,0} is a 
bit stronger than simply assuming that ^[0] is good since this later assumption 
only implies that < j — ^[0] for the time indexes j G {^[0], . . . , —1} such that 
1^+1 = -k. Nevertheless, for skeletons having terminal context w, 0[O] := max{i < 



— \w\ + 1 : 1^*'^'"'' ^ = w} used in Proposition 5.1 satisfies this stronger assump- 



tion. For skeletons r'^ as considered in Proposition |5.2[ the good coalescence time 
6'[0] := min{i > : Ylij=-i ^(0)i > cr} also satisfies the stronger assumption. We



now give the proof of this theorem, and then give two examples on $A = \{-1,+1\}$, and an application to the creation of new skeletons that can be used in Theorem 4.1.



Proof. The proof of this theorem follows exactly the same steps as the proof of Theorem 4.1. To avoid repetition, we just outline the key observation and leave the proof to the reader. Observe that, if at time $i$ the random variable $L_i$ takes value $l$, this means that we have to look at a portion of the past preceding $B_i$ which contains $l$ concatenated contexts of $\tau$. But under the conditions of the theorem, the blocks themselves are concatenations of contexts of $\tau$; hence it is still true that we have at least $l$ concatenated contexts contained in the $l$ blocks preceding $B_i$.

Therefore, if = a^, the number of blocks involved in both procedures is the same. 
Thus, the random variable A[0] obtained with the new decomposition has essentially 
the same tail distribution as A[0]. 



Example 7.1. For any $a \in A^{-\mathbb{N}}$, let

$$\mathcal{K}(a) := \inf\{i \ge 1 : a_{-k} = a_{-k-1},\ k \ge i\}$$

with the convention that $\mathcal{K}(a) = +\infty$ if the set is empty. Now let $P$ be defined by

P(l\a) = e+ ^ 



where / is an unbounded increasing integer valued function satisfying /(I) > jz2i- 
This way we get inf^ P{—l\a) = 1 — e — > e. We will show that this kernel has 
discontinuities at every point having either finitely many — I's or finitely many I's. 
Let S denote the set of such points and take aES. We have, for any z G A^^ \ S, 
P{M^-k^) — ^ for any k and therefore, does not converge to e + On the other 
hand, the kernel is continuous at any a G A"^ \ since P(l\a~lz) = e + ^, ii , 
which always converge to e because lC{aZ\z) converges to +00 (or equals infinity 
whenz G 

The main point is that the integer $\mathcal{K}(a)$ can never be checked by looking at a finite portion of the past, while both $\ell(a)$ and $T_\sigma(a)$ (used to define the kernels of Section 5.3) can be checked by looking at a finite portion of the past of $a$. This means that no matter how much we know of $a$, we can never make sure that it is indeed a continuous past for $P$. We now explain why this new kernel satisfies the conditions of Theorem 7.1. First, we observe that it belongs to $\mathrm{extLC}(\bar\tau)$ with

^ = {-i}uU{i(-inu{i}uU{-ii^} 

i>0 i>0 

since for any A; > and any set {v, vi, . . . , Vk} of elements of f, we have 

Einf P{a\v t>i t>2 . . . Vkz) = inf P(l|t> V1V2 ■ ■ ■ Vkz) + 1 — sup P{l\v f 1 f 2 . . . Vk z) 
z z _ 

aeA " " - 

1 



/[l^l + Elil^.|] 

and therefore, inf^g^ al = 1 — f^^2k+2) which converges to 1 as A; diverges. Also, 
observe that the p.s. {t,p) satisfies p{a\v) > e for any a G A and v & f, and 



therefore, fits the conditions of Proposition 5.1, because r has (—1)1 and 1(— 1) as 
terminal strings. It follows that IE|^[0]| < 1/e^ and thus 

^>1-(1+^^ ^ 



e2 7(2A; + 2) 

According to the function $f$ we choose, we can have the three regimes of CFTP specified by Theorem 7.1.



Example 7.2. This example is taken from De Santis & Piccioni (2012) (Example 1 therein). It is defined using a sequence of real numbers $\{\theta_n\}_{n\ge1}$ such that $\sum_{k\ge1}|\theta_k| < 1/2$, a continuous function $f : \mathbb{R}_+ \to [0,1]$ decreasing to 0, and the quantity

$$S_k(a_{-k}^{-1}) = \sum_{i=1}^{k-1}\mathbf{1}\{a_{-i} \ne a_{-i-1}\},$$

which counts the number of sign changes in $a_{-k}^{-1}$. For any $a \in A^{-\mathbb{N}}$, let

P{l\a) = l/2 + ^^,a_fc/(^,(a:^)). 
fc>i 

Observe that this kernel is somewhat similar to the AR kernel (introduced in Section 5.3.1) with $\psi = \mathrm{Id}$ and $\theta_0 = 1/2$, with the difference that each occurrence of a sign change in the past reduces the dependence, due to the multiplicative term in the sum. The same calculation as in the case of AR processes yields



Uk{a) = l~2f{/3Sk{aZl)) ^ |^,|. 

i>k+l 



This kernel is therefore continuous, as for the AR process, with the difference that instead of a multiplicative term $\psi'(c(a_{-k}^{-1}))$ which is bounded away from 0 and $+\infty$ (see (20)), we now have a term $f(\beta S_k(a_{-k}^{-1}))$ which goes to zero for the pasts $a$ having infinitely many changes of sign. In other words, these pasts have a faster continuity rate. Uniform continuity may not be useful because it amounts to taking into account only the sequence $\{\sum_{i\ge k+1}|\theta_i|\}_{k\ge0}$, which may converge too slowly to zero. This kernel is also an example in which the notion of local continuity of Theorem 4.1 cannot be used, since we cannot make sure whether a given past has a finite or infinite number of sign changes by looking only at a finite portion. However, it belongs to $\mathrm{extLC}(\bar\tau)$, since, for any $k \ge 0$ and any set $\{v, v_1, \ldots, v_k\}$ of elements of $\bar\tau$, denoting



/ = 1^1 + Ell 



Einf P{a\vvi V2 ■ ■ ■ z) = inf P{l\vvi V2 ■ ■ ■ Vkz) + 1 — snp P{l\vvi V2 ■ ■ ■ v^z) 
z z ^ 

aeA " " - 

= l-2f{^Si{aZl)) J2 l^^l- 

i>l+l 

Notice that we have $S_l(a_{-l}^{-1}) \ge k$, since there are at least $k$ sign changes in the concatenated string $v_k\cdots v_1 v$. Thus, we obtain



1) f * 



j>2fc+3 

and the same calculations as in the preceding example yields 



Here also, according to the function / we choose, we can have the three regimes 



of CFTP specified by Theorem 7.1 For instance, the special case of f{x) = e~^^, 
P > 0,x > yields an exponential tail for the coalescence time of the CFTP (regime 
(iii) of our theorem). 
Application of Theorem 7.1. By analogy with the definition of strong local continuity, we can define the class of kernels that satisfy the extended strong local continuity with respect to some skeleton $\tau$ (denoted $\mathrm{extSLC}(\tau)$). These are such that for any $v \in {}^{<\infty}\tau$, there exists a positive integer $h(v)$ such that for any $k \ge h(v)$, $\alpha_k^v = 1$. The interesting point is that such kernels, when they satisfy the conditions of items (ii) or (iii) of Theorem 7.1, are good p.s.'s themselves.

In other words, we have a kind of self-feeding argument, which allows us to construct more complicated good p.s.'s from simpler ones, using Theorem 7.1. And if we can show that $P$ is a good p.s., it can be used as such in Theorem 4.1.

We now explain why the kernels $P$ belonging to $\mathrm{extSLC}(\tau)$ and satisfying the assumptions of Theorem 7.1 are indeed good (see Definition 3.4). First, we observe



A:>0 



This inequality means that we can use {Ci}iez to obtain another coalescent time 
A[0], defined in the same way A[0] is defined using ([0] (that is, just as we did in 



the proof of Theorem 4.1). Clearly, we will have A[0] < A[0], moreover A[0] is a 
good coalescence time (see Definition 3.4) because Q is J-'(Y'i^)-measmah\e. The 
fact that P is in fact a good p.s. follows now from the fact that A[0] has finite 



expectation under the assumptions of items (ii) and (iii) Theorem 7.1 



8. Concluding remarks 

There are several results that follow from the existence of a CFTP scheme and the regenerative structures. Among them are bounds for the $\bar d$-distance, the rate of decay of correlations, concentration inequalities and a Functional Central Limit Theorem.

• Bounds for the $\bar d$-distance. Given a finite sample, it is natural to use a Markov approximation whose transition probabilities can be estimated from a sample of the infinite-order chain. A natural candidate would be the canonical $k$-th order approximation, which is obtained by cutting off the memory after $k$ steps. Bounds on the $\bar d$-distance can be used to characterize the rate of convergence of estimators for stationary processes that can be approximated by $k$-step Markov chains (see Csiszár & Talata (2010)). Gallo et al. (2011) use our perfect simulation scheme to derive new bounds for the $\bar d$-distance between the original chain and its canonical $k$-step Markov approximation.



• Loss of memory and decay of correlations. CFTP allows one to directly obtain explicit upper bounds for the speed of the loss of memory of the chain, as shown in Comets et al. (2002). On the other hand, it is known that the decay of correlations is bounded above by this speed (see Remark 6.2.2 of Maillard (2007) for instance). Roughly speaking, this means that both the decay of correlations and the speed of loss of memory are controlled by the tail distribution of the coalescence time of the CFTP.

• Concentration of measures. Another direct application of CFTP is the following result on concentration of measures, proved in Gallo & Takahashi (2011):



Proposition 8.1. Let ^ be a process that can be simulated by a CFTP al- 
gorithm with a coalescence time 9. If ^[9] < oo, then for all e > and all 
functions f : ^ we have 



P(|/(Xr)-E[/(Xr)]|>e)<2exp 



2e' 



'1 



• Functional Central Limit Theorem. Under assumptions ensuring that the coalescence time of time 0 has summable tail (basically the number of steps that have to be performed by the algorithm in order to construct the stationary chain at time 0), the constructed chain has a regeneration scheme. Such a structure has already been observed under stronger assumptions using continuity, for example by Lalley (1986), but also more recently in Comets et al. (2002) and Gallo (2011). It is worth mentioning that, using this fact, a Functional Central Limit Theorem could be derived for our chains, as has been done by Maillard & Schopfer (2008) under the continuity assumption. This is because, looking at their proof, we observe that it uses the regeneration property of the measure, and not the form of the conditional probabilities of the chain.